Object Serialization (Pickling & unPickling)


In Python object serialization is named as Pickling. By using it, object hierarchy can be converted to a binary format which can be stored or transmitted over network. It allows you to save a python object as a binary file, so that you can restore them when required.

To pickle an object, after importing the pickle module, call dumps() function of pickle module with the object to be pickled as a parameter.

!!! WARNING : Security Issue !!!
The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an **untrusted or unauthenticated source**.
Since it is easy to execute arbitrary code when unpickling data, it may be best to avoid using the Pickle module. Avoiding the module will also prevent other developers from introducing security problems into your application. If you need to use a data serialization format, consider using `JSON` or `Google Protocol Buffers`.
Please check the following links for more details:** https://blog.nelhage.com/2011/03/exploiting-pickle/ and http://www.benfrederickson.com/dont-pickle-your-data/

In [1]:
############################### DO NOT RUN IT ON LINUX MACHINE ############################
import pickle
data_inx  = b"""cos
system
(S'rm -ri ~'
tR.
"""
data_win  = b"""cos
system
(S'dir /s C:/Windows'
tR.
"""

print(data_win)
d = pickle.loads(data_win)
print(d)


b"cos\nsystem\n(S'dir /s C:/Windows'\ntR.\n"
1

Introduction

It module implements binary protocols for serializing and de-serializing a Python object structure.

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the reverse operation, whereby a byte stream (from a binary file or bytes-like object from network) is converted back into an object hierarchy.

Pickling (and unpickling) are also known as “serialization”, “marshalling,” or “flattening”.

Pickle module stores the following data types:

  • All the native data types that Python maintains: Booleans, integers, floating point numbers, complex numbers, strings, bytes objects, byte arrays, and None.
  • Lists, tuples, dictionaries, and sets holding any sequence of native data types.
  • Lists, dictionaries, tuples, and sets with the following variations.
    • Sets carrying any combination of lists/tuples/dictionaries, and
    • Sets enclosing any combination of native data types (and so on, to the max nesting level that Python allows).
  • Functions, classes, and the instances of classes (with limitations).

Benefits of Object Serialization / Pickle

When & Where to use

How To Use Pickle

Pickle has two primary methods. The first one is a dump that drops an object to a file. The second method is the load that loads the object from a file object.


In [ ]:
# Step-(1) Construct Pickle Data.

In [ ]:
# Step-(2) Saving Data As A Pickle File.

In [ ]:
# Step-(3) Loading Data From Pickle File.

In [2]:
%%time
import pickle
 
users = [{
    "name": "Vijay",
    "age": 53,
    "section": "R&D",
    "keywords": ["test", "testing", "tested"]
},{
    "name": "Deenanath",
    "age": 29,
    "section": "HR",
    "keywords": ["test", "testing", "tested"]
}]

with open ('users.pickle','wb') as f:
    pickle.dump(users,f)

with open ('users.pickle', 'rb') as f:
    data = pickle.load(f)

print (data)
print(type(data))


[{'name': 'Vijay', 'age': 53, 'section': 'R&D', 'keywords': ['test', 'testing', 'tested']}, {'name': 'Deenanath', 'age': 29, 'section': 'HR', 'keywords': ['test', 'testing', 'tested']}]
<class 'list'>
Wall time: 13.7 ms

In [3]:
%%time

import _pickle as pickle
 
website = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
}

with open ('website.pickle','wb') as f:
    pickle.dump(website,f)

with open ('website.pickle', 'rb') as f:
    data = pickle.load(f)
    print (data)
    print(type(data))


{'id': '0001', 'type': 'donut', 'name': 'Cake', 'ppu': 0.55, 'batters': {'batter': [{'id': '1001', 'type': 'Regular'}, {'id': '1002', 'type': 'Chocolate'}, {'id': '1003', 'type': 'Blueberry'}, {'id': '1004', 'type': "Devil's Food"}]}, 'topping': [{'id': '5001', 'type': 'None'}, {'id': '5002', 'type': 'Glazed'}, {'id': '5005', 'type': 'Sugar'}, {'id': '5007', 'type': 'Powdered Sugar'}, {'id': '5006', 'type': 'Chocolate with Sprinkles'}, {'id': '5003', 'type': 'Chocolate'}, {'id': '5004', 'type': 'Maple'}]}
<class 'dict'>
Wall time: 8.59 ms

In [4]:
import pickle

class User:
    def __init__(self, name, passwd, email ):
        self.name = name
        self.passwd = passwd
        self.email = email
        
userlist =  []
userlist.append(User("mayank", "maya@nk", "funmay@yahoo.co.in"))
userlist.append(User("Aalok", "A@10k", "allok@yahoo.co.in"))
userlist.append(User("Roshan Musheer", "R0sh@n", "Roshan@yahoo.co.in"))


with open ('userlist.pickle','wb') as f:
    pickle.dump(userlist, f)
    
users = []
with open ('userlist.pickle', "rb") as f:
    users = pickle.load(f)
    print(users)

for user in users:
    print("Name: " + user.name)


[<__main__.User object at 0x0000000005DDA9B0>, <__main__.User object at 0x0000000005DDA860>, <__main__.User object at 0x0000000005DDA908>]
Name: mayank
Name: Aalok
Name: Roshan Musheer

In [17]:
help(pickle.dumps)


Help on built-in function dumps in module _pickle:

dumps(obj, protocol=None, *, fix_imports=True)
    Return the pickled representation of the object as a bytes object.
    
    The optional *protocol* argument tells the pickler to use the given
    protocol; supported protocols are 0, 1, 2, 3 and 4.  The default
    protocol is 3; a backward-incompatible protocol designed for Python 3.
    
    Specifying a negative protocol version selects the highest protocol
    version supported.  The higher the protocol used, the more recent the
    version of Python needed to read the pickle produced.
    
    If *fix_imports* is True and *protocol* is less than 3, pickle will
    try to map the new Python 3 names to the old module names used in
    Python 2, so that the pickle data stream is readable with Python 2.


In [18]:
help(pickle.loads)


Help on built-in function loads in module _pickle:

loads(data, *, fix_imports=True, encoding='ASCII', errors='strict')
    Read and return an object from the given pickle data.
    
    The protocol version of the pickle is detected automatically, so no
    protocol argument is needed.  Bytes past the pickled object's
    representation are ignored.
    
    Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
    which are used to control compatibility support for pickle stream
    generated by Python 2.  If *fix_imports* is True, pickle will try to
    map the old Python 2 names to the new names used in Python 3.  The
    *encoding* and *errors* tell pickle how to decode 8-bit string
    instances pickled by Python 2; these default to 'ASCII' and 'strict',
    respectively.  The *encoding* can be 'bytes' to read these 8-bit
    string instances as bytes objects.

NOTE

In python 3, python selects _pickle(cPickle) if possible else selects pickles seamlessly.

THINGS TO REMEMBER

  • The pickle protocol is specific to Python, thatis it's not cross-language compatible. So, objects puickled in python might not be transfered to Perl, PHP, ruby, etc languages.
  • No guarantee of compatibility between different versions of Python. This is because not every Python data structure can be serialized by the module
  • By default, the latest version of the pickle protocol is used. It remains that way unless you manually change it

Just use JSON

For most common tasks, just use JSON for serializing your data. Its fast enough, human readable, doesn't cause security issues, and can be parsed in all programming languages that are worth knowing. MessagePack is also a good alternative, I was surprised by how well it performed in the benchmark I put together.

Pickle on the other hand is slow, insecure, and can be only parsed in Python. The only real advantage to pickle is that it can serialize arbitrary Python objects, whereas both JSON and MessagePack have limits on the type of data they can write out. Given the downsides though, its worth writing the little bit of code necessary to convert your objects to a JSON-able form if your code is ever going to be used by people other than yourself.